Text-Attentional Convolutional Neural Networks for Scene Text Detection
نویسندگان
چکیده
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/nontext information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates main task of text/non-text classification. In addition, a powerful low-level detector called ContrastEnhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with a F-measure of 0.82, improving the state-of-the-art results substantially.
منابع مشابه
Smart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text Reading
Physical library collections are valuable and long standing resources for knowledge and learning. However, managing books in a large bookshelf and finding books on it often leads to tedious manual work, especially for large book collections where books might be missing or misplaced. Recently, deep neural models, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) hav...
متن کاملRobust Text Reading in Natural Scene Images
In this paper, we consider applying multilayer, convolutional neural networks to construct a complete end-to-end text recognition system with performance comparable to the current state-of-the-art. Such a system demonstrates the possibility of using a single, unified architecture for both text detection and recognition with minimal need for elaborate hand-engineering of features.
متن کاملAccurate Text Localization in Natural Image with Cascaded Convolutional Text Network
We introduce a new top-down pipeline for scene text detection. We propose a novel Cascaded Convolutional Text Network (CCTN) that joints two customized convolutional networks for coarse-to-fine text localization. The CCTN fast detects text regions roughly from a low-resolution image, and then accurately localizes text lines from each enlarged region. We cast previous character based detection i...
متن کاملIntegrating Scene Text and Visual Appearance for Fine-Grained Image Classification with Convolutional Neural Networks
Text in natural images contains rich semantics that are often highly relevant to objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is combining word representations and deep visual features into a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading sys...
متن کاملEnd-to-end Text Recognition with Convolutional Neural Networks an Honors Thesis Submitted to the Department of Computer Science of Stanford University
Full end-to-end text recognition in natural images is a challenging problem that has recently received much attention in computer vision and machine learning. Traditional systems in this area have relied on elaborate models that incorporate carefully hand-engineered features or large amounts of prior knowledge. In this thesis, I describe an alternative approach that combines the representationa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1510.03283 شماره
صفحات -
تاریخ انتشار 2015